77 research outputs found
Digital Twins in Solar Farms: An Approach through Time Series and Deep Learning
The generation of electricity through renewable energy sources increases every day, with solar energy being one of the fastest-growing. The emergence of information technologies such as Digital Twins (DT) in the field of the Internet of Things and Industry 4.0 allows a substantial development in automatic diagnostic systems. The objective of this work is to obtain the DT of a Photovoltaic Solar Farm (PVSF) with a deep-learning (DL) approach. To build such a DT, sensor-based time series are properly analyzed and processed. The resulting data are used to train a DL model (e.g., autoencoders) in order to detect anomalies of the physical system in its DT. Results show a reconstruction error around 0.1, a recall score of 0.92, and an Area Under Curve (AUC) of 0.97. Therefore, this paper demonstrates that the DT can reproduce the behavior of the physical system as well as efficiently detect its anomalies. This project has been funded by the Ministry of Economy and Commerce with project contract TIN2016-88835-RET and by the Universitat Jaume I with project contract UJI-B2020-15.
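The anomaly-detection step described above can be illustrated with a minimal sketch: once an autoencoder has reconstructed each sensor window, windows whose reconstruction error exceeds a threshold (around 0.1 in the abstract) are flagged as anomalies. The function and variable names below are illustrative, not the paper's actual code, and the autoencoder itself is stubbed out by a hand-made reconstruction.

```python
import numpy as np

def detect_anomalies(x, x_reconstructed, threshold=0.1):
    """Flag windows whose reconstruction error exceeds a threshold.

    x and x_reconstructed are (n_windows, window_len) arrays. The 0.1
    threshold mirrors the reconstruction error reported in the abstract;
    in practice it would be tuned on normal-operation data.
    """
    errors = np.mean((x - x_reconstructed) ** 2, axis=1)  # per-window MSE
    return errors > threshold, errors

# Toy example: the second window's reconstruction is corrupted, so its
# error is large and it is flagged; the others reconstruct perfectly.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.05, size=(3, 16))
recon = normal.copy()
recon[1] += 1.0  # simulate a poor reconstruction of an anomalous window
flags, errs = detect_anomalies(normal, recon)
```

In a real pipeline, `recon` would come from the trained autoencoder's forward pass, and recall/AUC would be computed against labeled anomalies.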
Statistically-driven generation of multidimensional analytical schemas from linked data
The ever-growing Linked Data (LD) initiative has given rise to large amounts of open, semi-structured, rich data published on the Web. However, effective analytical tools that aid users in their analysis and go beyond browsing and querying are still lacking. To address this issue, we propose the automatic generation of multidimensional analytical stars (MDAS). The success of the multidimensional (MD) model for data analysis has been in great part due to its simplicity. Therefore, in this paper we aim at automatically discovering MD conceptual patterns that summarize LD. These patterns resemble the MD star schema typical of relational data warehousing. The underlying foundation of our method is a statistical framework that takes into account both concept and instance data. We present an implementation that makes use of the statistical framework to generate the MDAS. We have performed several experiments that assess and validate the statistical approach with two well-known and large LD sets. This research has been partially funded by the “Ministerio de Economía y Competitividad” with contract number TIN2014-55335-R. Victoria Nebot was supported by the UJI Postdoctoral Fellowship program with reference PI14490.
Semantic transference for enriching multilingual biomedical knowledge resources
Biomedical knowledge resources (KRs) are mainly expressed in English, and many applications using them suffer from the scarcity of knowledge in non-English languages. The goal of the present work is to make the most of existing multilingual biomedical KR lexicons to enrich their non-English counterparts. We propose to combine different automatic methods to generate pair-wise language alignments. More specifically, we use two well-known translation methods (GIZA++ and Moses), and we propose a new ad-hoc method specially devised for multilingual KRs. Then, the resulting alignments are used to transfer semantics between KRs across their languages. Transference quality is ensured by checking the semantic coherence of the generated alignments. Experiments have been carried out over the Spanish, French and German UMLS Metathesaurus counterparts. As a result, the enriched Spanish KR can grow up to 1,514,217 concepts (originally 286,659), the French KR up to 1,104,968 concepts (originally 83,119), and the German KR up to 1,136,020 concepts (originally 86,842).
Exploiting semantic annotations for open information extraction: an experience in the biomedical domain
The increasing amount of unstructured text published on the Web is demanding new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery; especially in the biomedical domain, the main efforts have been directed toward the recognition of well-defined entities such as genes or proteins, which constitutes the basis for extracting the relationships between the recognized entities. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of domain-independent relations from text that exploits the knowledge in the semantic annotations. The method is not geared to any specific domain (e.g., protein–protein interactions and drug–drug interactions) and does not require any manual input or deep processing. Moreover, the method uses the extracted relations to compute groups of abstract semantic relations characterized by their signature types and synonymous relation strings. This constitutes a valuable source of knowledge when constructing formal knowledge bases, as we enable seamless integration of the extracted relations with the available knowledge resources through the process of semantic annotation. The proposed approach has successfully been applied to a large text collection in the biomedical domain and the results are very encouraging. The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO).
Building Data Warehouses with Semantic Web Data
The Semantic Web (SW) deployment is now a realization and the amount of semantic annotations is ever increasing thanks to several initiatives that promote a change in the current Web towards the Web of Data, where the semantics of data become explicit through data representation formats and standards such as RDF/(S) and OWL. However, such initiatives have not yet been accompanied by efficient intelligent applications that can exploit the implicit semantics and thus provide more insightful analysis. In this paper, we provide the means for efficiently analyzing and exploring large amounts of semantic data by combining the inference power from the annotation semantics with the analysis capabilities provided by OLAP-style aggregations, navigation, and reporting. We formally present how semantic data should be organized in a well-defined conceptual MD schema, so that sophisticated queries can be expressed and evaluated. Our proposal has been evaluated over a real biomedical scenario, which demonstrates the scalability and applicability of the proposed approach.
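The OLAP-style aggregation mentioned above can be sketched with a toy roll-up: facts derived from semantic annotations are tuples of dimension values plus a measure, and a roll-up aggregates the measure along one dimension. The fact schema (disease, year, patient count) and all values below are invented for illustration; they are not from the paper's biomedical scenario.

```python
from collections import defaultdict

# Toy fact table derived from annotated data: (disease, year, patient_count).
facts = [
    ("asthma", 2009, 12),
    ("asthma", 2010, 15),
    ("diabetes", 2009, 30),
]

def rollup(facts, dim_index):
    """Sum the measure grouped by one dimension, as a cube roll-up would."""
    agg = defaultdict(int)
    for *dims, measure in facts:
        agg[dims[dim_index]] += measure
    return dict(agg)

by_disease = rollup(facts, 0)  # aggregate over years, keeping the disease axis
```

A real implementation would evaluate such aggregations over the MD schema with SPARQL or a dedicated OLAP engine rather than in-memory tuples.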
Defining Dynamic Indicators for Social Network Analysis: A Case Study in the Automotive Domain using Twitter
Paper presented at the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KDIR 2018), 18–20 September, Seville, Spain. In this paper we present a framework based on Linked Open Data infrastructures to perform analysis tasks in social networks based on dynamically defined indicators. Following the typical stages of business intelligence models, which start from the definition of strategic goals and derive relevant indicators (Key Performance Indicators), we propose a new scenario where the sources of information are the social networks. The fundamental contribution of this work is a framework for easily specifying and monitoring social indicators based on the measures offered by the APIs of the most important social networks. The main novelty of this method is that all the involved data and information are represented and stored as Linked Data. In this work we demonstrate the benefits of using linked open data, especially for processing and publishing company-specific social metrics and indicators.
Designing Similarity Measures for XML
In this demonstration we will show a series of tools that support a methodology [1] for the design of complex similarity functions in the context of heterogeneous XML systems.
XTaGe: a flexible generation system for complex XML collections
We introduce XTaGe (XML Tester and Generator), a system for the synthesis of XML collections meant for testing and micro-benchmarking applications. In contrast with existing approaches, XTaGe focuses on complex collections by providing a highly extensible framework to introduce controlled variability in XML structures. In this paper we present the theoretical foundation, internal architecture and main features of our generator; we describe its implementation, which includes a GUI to facilitate the specification of collections; we discuss how XTaGe's features compare with those in other XML generation systems; finally, we illustrate its usage by presenting a use case in the bioinformatics domain.
In the pursuit of a semantic similarity metric based on UMLS annotations for articles in PubMed Central
Motivation
Although full-text articles are provided by the publishers in electronic formats, it remains a challenge to find related work beyond the title and abstract context. Identifying related articles based on their abstract is indeed a good starting point; this process is straightforward and does not consume as many resources as full-text-based similarity would. However, further analyses may require an in-depth understanding of the full content. Two articles with highly related abstracts can differ substantially in their full content. How similarity differs when considering title-and-abstract versus full-text, and which semantic similarity metric provides better results when dealing with full-text articles, are the main issues addressed in this manuscript.
Methods
We have benchmarked three similarity metrics (BM25, PMRA, and Cosine) in order to determine which one performs best when using concept-based annotations on full-text documents. We also evaluated variations in similarity values based on title-and-abstract against those relying on full-text. Our test dataset comprises the Genomics track article collection from the 2005 Text Retrieval Conference. Initially, we used entity recognition software to semantically annotate titles and abstracts as well as full-text with concepts defined in the Unified Medical Language System (UMLS®). For each article, we created a document profile, i.e., a set of identified concepts, term frequency, and inverse document frequency; we then applied various similarity metrics to those document profiles. We considered correlation, precision, recall, and F1 in order to determine which similarity metric performs best with concept-based annotations. For those full-text articles available in PubMed Central Open Access (PMC-OA), we also performed dispersion analyses in order to understand how similarity varies when considering full-text articles.
Results
We have found that the PubMed Related Articles similarity metric is the most suitable for full-text articles annotated with UMLS concepts. For similarity values above 0.8, all metrics exhibited an F1 around 0.2 and a recall around 0.1; BM25 showed the highest precision close to 1; in all cases the concept-based metrics performed better than the word-stem-based one. Our experiments show that similarity values vary when considering only title-and-abstract versus full-text similarity. Therefore, analyses based on full-text become useful when a given research requires going beyond title and abstract, particularly regarding connectivity across articles.
Availability
Visualization available at ljgarcia.github.io/semsim.benchmark/, data available at http://dx.doi.org/10.5281/zenodo.13323. The authors acknowledge the support from the members of the Temporal Knowledge Bases Group at Universitat Jaume I. Funding: LJGC and AGC are both self-funded; RB is funded by the “Ministerio de Economía y Competitividad” with contract number TIN2011-24147.
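The document profiles described in the Methods section (concept, term frequency, inverse document frequency) can be sketched as sparse tf-idf vectors over UMLS concept identifiers, compared with cosine similarity, one of the three benchmarked metrics. The concept IDs, document frequencies, and function names below are illustrative, not the benchmark's actual data or code.

```python
import math
from collections import Counter

def tfidf_profile(concepts, df, n_docs):
    """Build a concept profile: UMLS concept ID -> tf-idf weight.

    concepts is the list of concept annotations found in one article;
    df maps each concept to its document frequency in the collection.
    """
    tf = Counter(concepts)
    return {c: tf[c] * math.log(n_docs / df[c]) for c in tf}

def cosine(p, q):
    """Cosine similarity between two sparse concept profiles."""
    dot = sum(w * q.get(c, 0.0) for c, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

df = {"C0027651": 2, "C0017337": 1, "C0033684": 1}  # toy document frequencies
a = tfidf_profile(["C0027651", "C0017337", "C0017337"], df, n_docs=10)
b = tfidf_profile(["C0027651", "C0033684"], df, n_docs=10)
sim = cosine(a, b)  # nonzero: both articles mention concept C0027651
```

BM25 and PMRA would replace `cosine` with their own scoring over the same concept profiles; the profile construction stays the same.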
A framework for obtaining structurally complex condensed representations of document sets in the biomedical domain
In this paper, we present a framework for obtaining structurally complex condensed representations of document sets, which will serve as a basis for summarization, answering complex questions, etc. This framework includes a method for extracting a ranked list of facts, triples of the form entity-relation-entity, which relies on dependency-parsing-based extraction patterns and language modeling; and methods for constructing a bipartite graph encoding the information contained in the set of facts and determining an appropriate traversal order on that structure. We evaluate the components of our framework on a subcollection extracted from MEDLINE, obtaining promising results.
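The bipartite encoding described above can be sketched minimally: fact nodes on one side, entity nodes on the other, with an edge linking each fact to the entities it mentions; a traversal can then hop between facts that share entities. The example facts and all names below are invented for illustration, not taken from the framework's MEDLINE evaluation.

```python
from collections import defaultdict

# Toy ranked fact list: (entity, relation, entity) triples.
facts = [
    ("aspirin", "inhibits", "COX-1"),
    ("COX-1", "produces", "thromboxane"),
]

# Bipartite structure: fact id -> its entities, entity -> fact ids.
fact_to_entities = {i: (s, o) for i, (s, _, o) in enumerate(facts)}
entity_to_facts = defaultdict(list)
for i, (s, o) in fact_to_entities.items():
    entity_to_facts[s].append(i)
    entity_to_facts[o].append(i)

def neighbours(fact_id):
    """Facts reachable in one hop through a shared entity."""
    s, o = fact_to_entities[fact_id]
    return sorted({j for e in (s, o) for j in entity_to_facts[e] if j != fact_id})
```

A traversal order over this graph (e.g., following shared entities from the highest-ranked fact) is what the framework uses to linearize the condensed representation.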